
    Developing performance-portable molecular dynamics kernels in OpenCL

    This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation kernel are the same across these architectures, and detail a number of platform-agnostic optimisations that improve its performance by at least 2x on all hardware considered. Our complete code is shown to be 1.7x faster than the original miniMD, and at most 2x slower than implementations individually hand-tuned for a specific architecture.

    Principled polycentrism and resourceful speakers

    © 2014 Asian Association of Teachers of English as a Foreign Language. All rights reserved. A central goal of language education is the development of resourceful speakers, people who have both good access to a range of linguistic resources and are good at shifting between styles, discourses, registers and genres. Communication becomes possible not because we adhere to global or even regional norms, but because language users are able to bring their communication into alignment with each other. Drawing on a series of studies of both online and face-to-face interaction in different cities in Asia, this paper suggests that to understand communication in contexts of diversity, we need to focus less on a supposed shared code and more on the interactions among language resources, activities and space. This in turn suggests that in order to pursue intelligibility in multilingual contexts we need a model of principled polycentrism, not the polycentrism of a World Englishes focus, with its established norms of regional varieties of English, nor the reduced communicative domain of the English as a lingua franca framework, but a more fluid yet principled approach to the diversity of contemporary contexts of communication.

    Market Lingos and Metrolingua Francas

    Copyright © Taylor & Francis Group, LLC. Drawing on data recorded in two city markets, this article analyzes the language practices of workers and customers as they go about their daily business, with a particular focus on the ways in which linguistic resources, everyday tasks, and social spaces are intertwined in producing metrolingua francas. The aim of the article is to come to a better understanding of the relationships among the use of diverse linguistic resources (drawn from different languages, varieties, and registers), the repertoires of the workers, the activities in which they are engaged, and the larger space in which this occurs. Developing the idea of spatial repertoires as the linguistic resources available in particular places, we explore the ways in which metrolingua francas (metrolingual multilingua francas) emerge from the spatial resources of such markets.

    Swimming with sharks, ecological feminism and posthuman language politics

    Copyright © Taylor & Francis Group, LLC. The critical project the authors propose overturns the assumptions of human centrality that have underpinned much educational thought and practice, questions the ways in which the human and nonhuman are defined, and opens up new forms of engagement with the material, corporeal, and affective world. The authors ask how critical language studies can be rethought to incorporate a better understanding of the place of humans in the more-than-human world. They discuss the growing body of work that connects concern with the environment with other forms of political activism, particularly through an ecological feminist lens. Bringing this discussion back to focus on the place of language and pedagogy in human exceptionalism, the authors explore ways in which alternative understandings of human relations to the more-than-human material world can reorient the logocentricity of critical language studies toward different forms of critical engagement and entangled pedagogies.

    Lingoing and everyday metrolingual metalanguage

    While the ways in which people talk about their everyday language use suggest that they live in a languagised world (a world in which language labels and enumerations are the common stuff of everyday language talk), their understanding of what those language labels mean may be both diverse and flexible. It is important not to make top-down assumptions about the meanings behind language labels. In this paper we are interested in the metrolingual metalanguage people use to describe everyday language use. This is not a question of a disjuncture between a delanguagised realm of academic analysis (such as the recent move towards translingual terminology) and a languagised realm of everyday metalanguage (where languages are named and labelled along normative lines), but rather a call to make visible what lies beneath such everyday terms and linguistic labels. Through an analysis of various discussions of everyday language use, we argue that although people often appear to talk in terms of fixed languages, such accounts are often flexible, negotiable and contestable. This is not therefore best understood in terms of a polarity between fixity and fluidity but rather as a flexible array of entangled language ideologies.

    Towards a portable and future-proof particle-in-cell plasma physics code

    We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on understanding the suitability of existing algorithms for future accelerator-based architectures, and identifying the changes necessary to achieve performance portability for particle-in-cell plasma physics codes. We achieve good levels of performance with limited changes to the algorithmic behaviour of the code. However, our results suggest that a fundamental change to EPOCH3D’s current accumulation step (and its dependency on atomic operations) is necessary in order to fully utilise the massive levels of parallelism supported by emerging parallel architectures.

    Experiences with porting and modelling wavefront algorithms on many-core architectures

    We are currently investigating the viability of many-core architectures for the acceleration of wavefront applications and this report focuses on graphics processing units (GPUs) in particular. To this end, we have implemented NASA’s LU benchmark – a real-world production-grade application – on GPUs employing NVIDIA’s Compute Unified Device Architecture (CUDA). This GPU implementation of the benchmark has been used to investigate the performance of a selection of GPUs, ranging from workstation-grade commodity GPUs to the HPC “Tesla” and “Fermi” GPUs. We have also compared the performance of the GPU solution at scale to that of traditional high performance computing (HPC) clusters based on a range of multi-core CPUs from a number of major vendors, including Intel (Nehalem), AMD (Opteron) and IBM (PowerPC). In previous work we have developed a predictive “plug-and-play” performance model of this class of application running on such clusters, in which CPUs communicate via the Message Passing Interface (MPI). By extending this model to also capture the performance behaviour of GPUs, we are able to: (1) comment on the effects that architectural changes will have on the performance of single-GPU solutions, and (2) make projections regarding the performance of multi-GPU solutions at larger scale.

    Parallelising wavefront applications on general-purpose GPU devices

    Pipelined wavefront applications form a large portion of the high performance scientific computing workloads at supercomputing centres. This paper investigates the viability of graphics processing units (GPUs) for the acceleration of these codes, using NVIDIA's Compute Unified Device Architecture (CUDA). We identify the optimisations suitable for this new architecture and quantify the characteristics of those wavefront codes that are likely to experience speedups.

    The role of answer fluency and perceptual fluency in the monitoring and control of reasoning: Reply to Alter, Oppenheimer, and Epley

    In this reply, we provide an analysis of Alter et al.’s (2013) response to our earlier paper (Thompson et al., 2013). In that paper, we reported difficulty in replicating Alter, Oppenheimer, Epley, and Eyre’s (2007) main finding, namely that a sense of disfluency, produced by making stimuli difficult to perceive, increased accuracy on a variety of reasoning tasks. Alter, Oppenheimer, and Epley (2013) argue that we misunderstood the meaning of accuracy on these tasks, a claim that we reject. We argue and provide evidence that the tasks were not too difficult for our populations (such that no amount of “metacognitive unease” would promote correct responding) and point out that in many cases performance on our tasks was well above chance or on a par with Alter et al.’s (2007) participants. Finally, we reiterate our claim that the distinction between answer fluency (the ease with which an answer comes to mind) and perceptual fluency (the ease with which a problem can be read) is genuine, and argue that Thompson et al. (2013) provided evidence that these are distinct factors that have different downstream effects on cognitive processes.

    An investigation of the performance portability of OpenCL

    This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single-source application is benchmarked on a number of different architectures, and is shown to be 1.3–1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3–3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL’s device fissioning capability, demonstrating up to a 3× speed-up over our original OpenCL implementation.